Abstract. Based on several lines of empirical evidence, a series of papers has 
advocated the concept that seismicity prior to a large earthquake can be 
understood in terms of the statistical physics of a critical phase transition. 
In this model, the cumulative seismic Benioff strain release $\epsilon$ increases as a 
power law of the time to failure before the final event. This power law reflects a 
kind of scale invariance with respect to the distance to the critical point: $\epsilon$ 
is the same up to a simple rescaling $\lambda^z$ after the time to failure has been 
scaled by a factor $\lambda$. A few years ago, on the basis of a fit of the cumulative 
Benioff strain released prior to the 1989 Loma Prieta earthquake, Sornette 
and Sammis [1995] proposed that this scale invariance could be partially 
broken into a discrete scale invariance, defined such that the scale invariance 
occurs only with respect to specific integer powers of a fundamental scale 
ratio. The observable consequence of discrete scale invariance takes the form 
of log-periodic oscillations decorating the accelerating power law. They found 
that the quality of the fit and the predicted time of the event are significantly 
improved by the introduction of log-periodicity. Here, we present a battery 
of synthetic tests performed to quantify the statistical significance of this 
claim. We pay special attention to defining synthetic tests that are 
as close as possible to the real time series except for the property 
to be tested, namely log-periodicity. Without this precaution, we would 
conclude that the existence of log-periodicity in the Loma Prieta cumulative 
Benioff strain is highly statistically significant. In contrast, we find that 
log-periodic oscillations with frequency and regularity similar to those of the 
Loma Prieta case are very likely to be generated by the interplay of the low 
pass filtering step due to the construction of cumulative functions together 
with the approximate power law acceleration. Thus, the single Loma Prieta 
case alone cannot support the initial claim and additional cases and further 
study are needed to increase the signal-to-noise ratio, if any. The present study 
will be a useful methodological benchmark for future testing of additional 
events when the methodology and data to construct reliable Benioff strain 
functions become available. 
1. Introduction 

The idea that earthquakes are somewhat analogous 
to critical phenomena of statistical mechanics has been 
gaining ground in the last few decades [Chelidze, 1982; 
Allegre et al., 1982; Sornette and Sornette, 1990; 
Tumarkin and Shnirman, 1992; Sornette and Sammis, 
1995; Newman et al., 1995; Bowman et al., 1998; Jaume 
and Sykes, 1999]. One of the consequences of this 
new point of view is that events occurring even several 
decades before a large main shock can be considered as 
seismic precursors, and that a study of this precursory 
seismicity might give a fairly good indication of when 
the impending major earthquake will take place, and 
how big it is going to be. Although such considerations 
are still in their infancy, and the usual caveats about 
earthquake prediction must be kept in mind, a lot of 
work has already been devoted to "post-diction", 
sometimes with impressive success. One of the pioneering 
cases in that direction was the 1989 Loma Prieta earth- 
quake, where Sornette and Sammis [1995] proposed to 
see the empirical power law used by Bufe and Varnes 
[1993] in the perspective of criticality in the sense of 
statistical physics [Sornette, 2000]. They found that 
the cumulative Benioff strain starting about 50 years 
ago could be well-fitted with a power law, giving rise 
to a post-diction for the main event of (1990.3 ± 4.1), a 
reasonably satisfactory result. 

Things got even more exciting after the paper of Sor- 
nette and Sammis [1995] where it was pointed out that 
the strong oscillations around the power law could be 
fitted as well using a complex exponent correction to 
scaling: 

$$\epsilon(t) = A + B(t_f - t)^z \left\{1 + C\cos\left[\omega\log(t_f - t) + \phi\right]\right\}. \qquad (1)$$

In this formula, the parameter tf is the time of the 
main shock (a pure power law would correspond to 
C = 0), and the best fit gave rise to an estimate 
tf = 1989.9 ± 0.8, considerably closer to the real date 
than the one in [Bufe and Varnes, 1993], and with much 
less uncertainty. 

The existence of complex correction to scaling ex- 
ponents could be linked to an underlying discrete scale 
invariance, a very appealing property from a theoretical 
point of view: the initial observation in [Sornette and 
Sammis, 1995] therefore spurred a lot of development 
[Saleur et al., 1996a, b; Johansen et al., 1996; Huang 
et al., 1998; Sornette, 1998; Bowman et al., 1998; 
Johansen et al., 2000; Sammis and Smith, 1999; Jaume 
and Sykes, 1999]. 

Although the quality of the fit in Sornette and Sam- 
mis [1995] is, to the naked eye, impressively good, 
the suspicion arose recently that the oscillations in the 
Loma Prieta Benioff strain could be merely the result of 
noise. The synthetic tests performed in [Sornette and 
Sammis, 1995] being somewhat incomplete, we have de- 
cided to reanalyze this question much more carefully in 
the present work. 

We have performed two types of synthetic tests to do 
this re-analysis. 

In the first type, we consider random power laws (ex- 
plained in Section 3), with parameters that match those 
of the real data, and study whether noise can give rise 
to log-periodic structures after integration. The advan- 
tage of this approach is that the two key ingredients of 
the analysis of the real data (power law and integration) 
are captured in a simple way. Its drawback is that the 
synthetic data (a sequence of random numbers drawn 
from a power law probability distribution, from which 
a cumulative quantity is constructed) are not quite of 
the same nature as the real data (a sequence of times and 
magnitudes, from which the cumulative Benioff strain 
is constructed). In particular, the sampling for the 
synthetic data is essentially periodic in log scale, and 
this effect, combined with integration, is indeed expected 
to give rise to spurious log-periodic oscillations. 
Nevertheless, we find these synthetic tests quite useful, 
as they complement the considerations 
in [Huang et al., 2000]. 

The second type of synthetic test is devised 
especially to avoid this issue of sampling: we generate 
data for both time and magnitude, in such a way that 
the probability distributions of the synthetic ($t^s$, $m^s$) 
and the real ($t^r$, $m^r$) quantities are the same. There 
is a problem in doing so: although the synthetic data 
and the real data have the same distribution, there is 
no guarantee that the cumulative Benioff strain 
constructed from the synthetic sequences is really a power 
law, because a power law dependence involves 
higher-order statistics (i.e., correlation and dependence) not 
captured by the one-point distribution functions. To 
preserve the feature of the real data that events are 
more frequent and of higher magnitude when closer 
to the main shock (or the last data point), we added 
a reordering procedure which shuffles the synthetic 
sequences in such a way that the event with the j-th 
largest magnitude is at the same position as the real 
event with the j-th largest magnitude in the real sequence. 

We find that for both types of synthetic tests, it is, 
surprisingly, quite possible to get spurious (that is, 
entirely due to noise) log-periodic oscillations which are 
as good as those observed in the real Loma Prieta data. 
This conclusion is made quantitative in a variety of 
ways, in particular by studying the highest peak of the 
spectrum of oscillations around the power law for the 
real data, and building the probability distribution of 
such peaks for synthetic data. We thus conclude that, 
at the present time, it is not possible to distinguish the 
log-periodic oscillations observed in [Sornette and Sam- 
mis, 1995] from noise. 

The present study is related to [Huang et al., 2000]. 
The common theme is the investigation of the 
conditions under which log-periodicity can be created 
spontaneously by noise. In [Huang et al., 2000], the goal 
is to study in detail the underlying mechanism, 
relying solely on the manipulation of data: the 
generally found non-uniform sampling together with a 
low-pass filtering step, as occurs in constructing cumulative 
functions, in maximum likelihood estimations and in 
detrending, is enough to create apparent log-periodicity. 
A detailed exploration of this mechanism has been 
offered in [Huang et al., 2000], together with extensive 
numerical simulations to demonstrate all its main 
properties. It was shown that this "synthetic" scenario for 
log-periodicity relies on two steps: 1) the fact that 
approximately logarithmic sampling in time corresponds 
to uniform sampling in the logarithm of time; 2) 
integration reddens the noise and, in a finite sample, creates 
a maximum in the spectrum leading to a most 
probable frequency in the logarithm of time. In [Huang 
et al., 2000], this insight was then used to analyze 
the 27 best aftershock sequences studied by Kisslinger 
and Jones [1991] and to search for traces of genuine 
log-periodic corrections to Omori's law, which states that 
the earthquake rate decays approximately as the inverse 
of the time since the last main shock. The observed 
log-periodicity was shown to result almost entirely from the 
"synthetic scenario" due to the data analysis. From 
a statistical point of view, resolving the issue of the 
possible existence of log-periodicity in aftershocks will 
be very difficult, as Omori's law describes a point 
process with a uniform sampling in the logarithm of the 
time. By construction, strong log-periodic fluctuations 
are thus created by this logarithmic sampling. In 
contrast, in the present paper, we apply the insight 
obtained in [Huang et al., 2000] to study accelerated 
power laws culminating in a finite-time singularity at 
time $t_f$. 

To be complete, we should also point out the 
following. The paper of Sornette and Sammis [1995] contains a 
forward prediction of an earthquake in the 
Kommandorski Island region at time $t_f = 1996.3 \pm 1.1$ years. 
Forward predictions provide a much larger statistical 
significance since the model parameters are estimated 
independently and outside their domain of application 
(see below the discussion in the section on the 
analysis procedure). Forward prediction also has the 
merit of increasing the number of cases. The prediction 
of a critical time is not enough: one must also specify the 
magnitude of the predicted earthquake. In [Sornette 
and Sammis, 1995], the magnitude was not specified 
but can probably be taken following the specification 
of Bufe et al. [1994] of a magnitude in the range 
7.5-8.5 occurring in a zone originally outlined by Nishenko 
[1991]. The largest earthquake in the Harvard catalog 
during the time period 1994-1998 has a moment 
magnitude $M_w = 6.6$ (1996/07/16, 56.16N, 164.98E). The 
same event has the magnitude $M_s = 6.4$ in the PDE 
catalog. If the prediction is considered correct only if 
both its time and magnitude range are as predicted, this 
prediction is a failure. However, it is hard to draw any 
firm conclusion based on this single case with respect 
to the usefulness of the critical earthquake concept and of 
log-periodicity, as the methodology has evolved 
significantly since the initial paper of Sornette and Sammis 
[1995] (see for instance [Bowman et al., 1998; Ouillon 
and Sornette, 2000]). 

The plan of this paper is as follows. In Section 2 we 
present some general observations on synthetic tests and 
their interpretations. Details on our two types of tests, 
together with their results, are presented in Sections 3 
and 4. Our conclusions are collected in Section 5. 

2. Analysis Procedure 

Usually, the major problem in establishing the sta- 
tistical significance of a forecasting procedure is its ret- 
rospective character involving a limited number of data 
and a significant number of explicit (the parameters of 
the fit) and implicit (the total time and space windows 
used, etc.) degrees of freedom. In such situations, the 
calculation of statistical significance becomes very dif- 
ficult and uncertain as soon as the adjustable parame- 
ters are determined from the data. The conclusions can 
then be artifacts of the processing technique or of a se- 
lection bias. The paper of Sornette and Sammis [1995] 
certainly suffers from this problem. Since there are no 
general methods or techniques that would allow us to 
overcome these difficulties, we stick to a more modest 
approach, which turns out to be sufficient to draw a 
clear and meaningful conclusion. 

We develop two types of synthetic tests. For both 
types, the key part is the comparison of the log-periodic 
oscillations in the real seismicity to those in the 
synthetic sequences. Similar oscillations should have 
similar frequencies and similar regularities, and this can best 
be quantified by considering the spectrum of these 
oscillations. We focus on the highest peak (characterized 
by angular frequency $\omega$ and peak height $h$) in the 
spectrum, which quantifies the most significant frequency 
component in the oscillations as a function of $\log(t_f - t)$ 
(since we are looking for log-periodicity). The Lomb 
method [Press et al., 1992] is used instead of the usual 
Fourier transform, since the data points are not 
equidistantly spaced. 
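
The spectral step described above can be sketched as follows. This is a minimal illustration, assuming the standard normalized Lomb periodogram as given in textbooks such as [Press et al., 1992]; the function names and the frequency grid are hypothetical choices made for this example and are not taken from the original analysis. The abscissa passed in would be $\log(t_f - t)$, and the ordinate the oscillations around the power law.

```python
import math

def lomb_periodogram(x, y, omegas):
    """Normalized Lomb periodogram of values y sampled at uneven positions x.

    Each entry of omegas must be a positive angular frequency.
    """
    n = len(x)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    powers = []
    for w in omegas:
        # The offset tau makes the result independent of the origin of x.
        tau = math.atan2(sum(math.sin(2 * w * xi) for xi in x),
                         sum(math.cos(2 * w * xi) for xi in x)) / (2 * w)
        cos_terms = [math.cos(w * (xi - tau)) for xi in x]
        sin_terms = [math.sin(w * (xi - tau)) for xi in x]
        yc = sum((v - mean) * c for v, c in zip(y, cos_terms))
        ys = sum((v - mean) * s for v, s in zip(y, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power = 0.0
        if cc > 0.0:
            power += yc * yc / cc
        if ss > 0.0:
            power += ys * ys / ss
        powers.append(power / (2.0 * var))
    return powers

def highest_peak(x, y, omegas):
    """Return (omega, height) of the largest periodogram value on the grid."""
    powers = lomb_periodogram(x, y, omegas)
    best = max(range(len(omegas)), key=lambda k: powers[k])
    return omegas[best], powers[best]
```

The pair returned by `highest_peak` plays the role of ($\omega$, $h$) in the text; the resolution of the frequency grid is a free choice of this sketch.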

The ultimate purpose of synthetic tests is to get 
the significance level (the probability of getting the 
same thing by accident) of the real observation. One 
natural way of evaluating the significance level would 
be, it seems, to count the number of synthetic peaks 
(defined as spectral components with spectral power 
higher than neighboring frequencies, in other words, 
local maxima in the spectrum) within the intervals 
$\omega^s \in [\omega^r - \Delta, \omega^r + \Delta]$ and $h^s > h^r$ (superscripts 
s and r denote synthetic and real data, respectively), 
and to normalize this by the total number of synthetic 
peaks. This might however lead to incorrect conclusions. 
For instance, suppose that, from 10000 synthetic peaks, only 
one peak with peak height higher than 10 (the height 
of the real peak) and angular frequency $\omega$ in the range 
$[\pi, 2\pi]$ was found: could we conclude that the 
significance level is 0.01% (or confidence level of 99.99%)? 
Probably not. To see why, suppose the distribution of 
the $\omega$'s were uniform in [0, 40], and the distribution in peak 
height uniform in [6.862, 10.004]: then the probability 
of observing one peak of height above 10 and $\omega \in [\pi, 2\pi]$ 
would roughly be 1/10000. But this small probability 
does not mean that the peak of height above 10 and 
$\omega \in [\pi, 2\pi]$ is highly significant! Due to the uniform 
distribution in both $\omega$ and peak height, any other pair 
of ($\omega$, $h$) would have, in fact, been observed with the 
same probability. To draw positive conclusions from 
such a simple counting analysis, one would need to have 
in advance a theory indicating where the peak should 
be found, and approximately at what height. It is not 
clear that there is such a thing at present, and therefore 
we prefer to use a more conservative approach. 
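
The arithmetic of this thought experiment can be checked directly. The sketch below uses only the uniform ranges quoted in the text; the function name and the second, arbitrarily placed box are hypothetical and serve only to show that every box of the same size is equally improbable under a uniform distribution.

```python
import math

# Uniform ranges of the thought experiment in the text.
OMEGA_RANGE = (0.0, 40.0)
HEIGHT_RANGE = (6.862, 10.004)

def box_probability(omega_lo, omega_hi, h_lo, h_hi):
    """Probability that a uniformly distributed peak falls inside the box."""
    p_omega = (omega_hi - omega_lo) / (OMEGA_RANGE[1] - OMEGA_RANGE[0])
    p_height = (h_hi - h_lo) / (HEIGHT_RANGE[1] - HEIGHT_RANGE[0])
    return p_omega * p_height

# Box of the example: omega in [pi, 2 pi], height above 10.
p_example = box_probability(math.pi, 2.0 * math.pi, 10.0, HEIGHT_RANGE[1])

# Any other box of the same size (here an arbitrary one) is just as unlikely.
p_other = box_probability(20.0, 20.0 + math.pi, 8.0, 8.004)
```

Both probabilities come out close to 1/10000, which is why a small counting probability alone says nothing about significance.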

In the language of statistics, this question is related 
to the difference between first-order and second-order 
statistics. In first-order statistics, we ask: "what is the 
probability of observing the peak we see just by chance?" 
In second-order statistics, we ask: "what is the 
probability of observing some (first-order significant) peak 
somewhere (whatever its position)?" In our context, 
the determination of the confidence level within 
first-order statistics requires that we have an a priori 
understanding of where the peak should be found. 

In the absence of any theoretical predictions, it seems 
natural, in order to quantify the significance level of the 
log-periodic oscillations, to rely somehow on the 
probability distribution of $\omega$ and peak height in the synthetic 
samples, i.e., to rely on second-order statistics. We have 
not, however, managed to come up with a totally 
satisfactory, objective way to use this probability distribution. 
A useful quantity we came up with (although it should not 
be trusted blindly) is the ratio $R$ of the probability of 
observing a given peak to the probability of observing 
the most probable peak: if the ratio is close to 1, this 
surely indicates that the peak is not very significant. 

To obtain this ratio, one needs the probability 
density function $p(\omega, h)$ of the synthetic peaks: the latter 
can be constructed using the Kernel Density method 
[Silverman, 1986; Beardah, 1995]. We then set 

$$R = \frac{p(\omega^r, h^r)\,d\omega\,dh}{p(\omega^{mp}, h^{mp})\,d\omega\,dh},$$

where ($\omega^r$, $h^r$) characterize the real peak, 
and ($\omega^{mp}$, $h^{mp}$) is the most probable synthetic peak. 
The ratio quantifies how frequently in synthetic 
sequences we can observe log-periodicity similar to that 
observed in the real sequence. There are some technical 
advantages in using this ratio. For instance, there is in 
fact no arbitrariness in choosing the intervals $d\omega$ and 
$dh$, since they cancel out between the numerator and 
the denominator: it is then enough to just use the 
function value of $p(\omega, h)$ at ($\omega^r$, $h^r$) and ($\omega^{mp}$, $h^{mp}$). Also, 
the special choices (e.g., the degree of smoothing and 
the type of kernels) made in constructing the 
probability density function $p(\omega, h)$ hopefully also cancel out in 
the ratio, and have little influence on the final result. 
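
As an illustration of how such a ratio could be computed, the sketch below builds a two-dimensional Gaussian product-kernel density estimate and evaluates it at the real peak and at its maximum on a grid. The kernel type, the fixed bandwidths, the grid search for the mode, and all function names are assumptions of this example; the kernel and smoothing actually used with the Kernel Density method are not specified in the text.

```python
import math

def gaussian_kde_2d(peaks, bw_omega, bw_h):
    """Return p(omega, h), a Gaussian product-kernel density estimate."""
    norm = 1.0 / (len(peaks) * 2.0 * math.pi * bw_omega * bw_h)

    def density(omega, h):
        total = 0.0
        for w_i, h_i in peaks:
            u = (omega - w_i) / bw_omega
            v = (h - h_i) / bw_h
            total += math.exp(-0.5 * (u * u + v * v))
        return norm * total

    return density

def peak_ratio(peaks, real_peak, bw_omega, bw_h, grid=41):
    """R = p(real peak) / p(most probable synthetic peak), with 0 <= R <= 1."""
    density = gaussian_kde_2d(peaks, bw_omega, bw_h)
    w_lo = min(w for w, _ in peaks) - 3.0 * bw_omega
    w_hi = max(w for w, _ in peaks) + 3.0 * bw_omega
    h_lo = min(h for _, h in peaks) - 3.0 * bw_h
    h_hi = max(h for _, h in peaks) + 3.0 * bw_h
    p_real = density(*real_peak)
    # Locate the mode on a regular grid; the real peak is included as a
    # candidate so that the ratio never exceeds 1.
    p_mode = p_real
    for i in range(grid):
        w = w_lo + (w_hi - w_lo) * i / (grid - 1)
        for j in range(grid):
            h = h_lo + (h_hi - h_lo) * j / (grid - 1)
            p_mode = max(p_mode, density(w, h))
    return p_real / p_mode
```

Here `peaks` would be the list of ($\omega$, $h$) pairs of the highest synthetic peaks. Because only a ratio of densities is used, the normalization constant drops out, in line with the cancellation of $d\omega$ and $dh$ noted above.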

The generation of synthetic data and extraction of 
oscillations depend on the type of synthetic tests and 
will be explained separately in the following sections. 

3. Synthetic Test I 

3.1. Generation of Synthetic Data 

We start from a simple method which takes into ac- 
count the most crucial features of the real data. We 
then refine this method in several ways in Section 4. 

The crucial features of the real data are the power law 
and the integration. The latter (considering the 
cumulative Benioff strain) is necessary for numerical reasons, 
as there are not enough data points to study directly 
the rate at which energy is released (moreover, 
considering the rate leads to other difficulties, like the 
influence of the binning intervals, etc.). We therefore take 
a pure power law as the null hypothesis, and generate 
data with a probability density fitting the power law 
part of $d\epsilon/dt$: we mimic real data as closely as 
possible, taking in particular the same number of points. We 
then construct a synthetic cumulative Benioff strain by 
numerical integration, and investigate whether noise in 
the sampling of the power law can give rise to spurious 
log-periodic oscillations. 
Again, the power law is 

$$\frac{d\epsilon(t)}{dt} \propto (t_f - t)^{(m-1)} \qquad (2)$$

following (1) in [Sornette and Sammis, 1995]. We 
assume the range for $t$ is $[t_0, t_1]$ with $t_1 < t_f$ to avoid the 
singularity at $t_f$. After normalization, we have 



$$\frac{d\epsilon(t)}{dt} = \frac{m}{(t_f - t_0)^m - (t_f - t_1)^m}\,(t_f - t)^{(m-1)} \qquad (3)$$



For the real seismic precursors, $t$ is the time of 
occurrence of an earthquake; for synthetic events, $t$ is a 
random variable with probability density function 
$p(t) = d\epsilon(t)/dt$. To make sure the synthetic events and 
the real seismic precursors have the same power law 
distribution, we chose the same parameters $t_0$, $t_1$, $t_f$, 
$m$, $N$ (from Table I and Figure 1 of [Sornette and 
Sammis, 1995]), where $N$ is the number of events. (Since 
the original data were not available at the time of this 
study, we retrieved data from the CNSS catalog using 
their space-time-magnitude window. However, for some 
unknown reason, our data set was slightly different from 
theirs: we got only 27 events instead of 31.) The 
random variable $t$ with given $p(t)$ can be obtained from 
a random variable $x$ uniformly distributed on $[x_0, x_1]$ 
by solving [Press et al., 1992] 

$$p(x)\,dx = p(t)\,dt.$$

The transformation is 



$$t = t_f - \left[(t_f - t_0)^m - \frac{x - x_0}{x_1 - x_0}\left((t_f - t_0)^m - (t_f - t_1)^m\right)\right]^{1/m}. \qquad (4)$$



We then construct the cumulative distribution function 
of $t$ and use this function to mimic the cumulative 
Benioff strain of the real sequence. The two have the same 
power law parameters, and both are indeed 
constructed by integration. 
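
The generation step of this section can be sketched as inverse-transform sampling. The code below inverts the cumulative distribution of the normalized power law density, writing $u = (x - x_0)/(x_1 - x_0)$ for the uniform variable. The numerical parameter values are placeholders chosen for illustration only (they are not the fitted values of [Sornette and Sammis, 1995]), and the function names are hypothetical.

```python
import random

def time_from_uniform(u, t0, t1, tf, m):
    """Map u in [0, 1] to an event time by inverting the power law CDF."""
    a = (tf - t0) ** m
    b = (tf - t1) ** m
    return tf - (a - u * (a - b)) ** (1.0 / m)

def synthetic_times(n, t0, t1, tf, m, rng):
    """Draw n sorted event times with density proportional to (tf - t)**(m - 1)."""
    return sorted(time_from_uniform(rng.random(), t0, t1, tf, m)
                  for _ in range(n))

def cumulative_distribution(times):
    """Empirical CDF of sorted times, the stand-in for the cumulative strain."""
    n = len(times)
    return [(t, (k + 1) / n) for k, t in enumerate(times)]

# Placeholder parameters for illustration only (not the fitted values).
T0, T1, TF, M, N = 1927.0, 1988.5, 1989.9, 0.35, 27
sequence = cumulative_distribution(
    synthetic_times(N, T0, T1, TF, M, random.Random(1)))
```

With $m < 1$ the density increases toward $t_f$, so the synthetic events cluster near the end of the window, as the real precursors do.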



3.2. Extraction of Oscillations 

The original analysis procedure used in [Sornette and 
Sammis, 1995] was to fit the cumulative Benioff strain 
to a power law with log-periodic oscillations ((1), the 
same as (8) in [Sornette and Sammis, 1995]). The 
quality of the observed log-periodicity was not quantified 
in [Sornette and Sammis, 1995] other than by showing 
that the quality of the fit, measured by the residue, as 
well as the predicted critical time $t_f$ were both 
substantially improved compared to those from the fit with the 
simple power law. For our study, however, it is crucial 
to quantify it. 

To get the oscillations, we first fit a power law with 
log-periodic oscillations (1) to both the real and 
synthetic data. The pure power law part (obtained by 
setting $C = 0$ in (1)) is then subtracted, and we obtain the 
remaining oscillations. These oscillations are in turn 
analyzed using the procedure outlined in Section 2. 

3.3. Results 

3.3.1. The real sequence. We first characterize 
the log-periodicity observed in the real sequence. The 
fit of (1) to the real data is remarkable (Figure 1). The 
oscillations around the power law part show 
approximately 2.5 cycles of regular oscillations (Figure 2), the 
spectrum of which has a peak near $\omega^r \approx 6.1$ with height 
$h^r \approx 7.5$ (Figure 3), which is significantly different from 
Gaussian noise (the chance of observing such a peak 
from Gaussian noise of the same number of data points 
is less than 2% according to (13.8.7) of [Press et al., 
1992]). However, since we are dealing with 
oscillations around a cumulative quantity, integrated 
Gaussian noise would be a more appropriate null hypothesis. 
We will study the chance of observing such a peak from 
integrated Gaussian noise in Section 3.3.2. 
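
The white-noise estimate quoted above can be reproduced with the standard false-alarm formula for the normalized Lomb periodogram, $P(>h) = 1 - (1 - e^{-h})^M$, which we take to be the expression referred to as (13.8.7) of [Press et al., 1992]. In this sketch, setting the number of independent frequencies $M$ equal to the 27 events of our data set is an assumption; the effective $M$ depends on the frequency range scanned.

```python
import math

def false_alarm_probability(peak_height, n_independent):
    """Chance that white Gaussian noise yields a peak at least this high."""
    return 1.0 - (1.0 - math.exp(-peak_height)) ** n_independent

# Height of the real peak from the text; M = 27 is an assumed value.
p_white = false_alarm_probability(7.5, 27)
```

This gives a probability of about 1.5%, consistent with the "less than 2%" stated in the text. It applies only to white Gaussian noise, not to the integrated noise that is the more appropriate null hypothesis here.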

3.3.2. Synthetic sequences. 300 synthetic 
sequences were generated using the parameters of the 
seismic precursors for the 1989 Loma Prieta earthquake. 
They were analyzed in the same way as the real precur- 
sor sequence. See Figures 4, 5, and 6 for typical re- 
sults. We note that it is possible to observe synthetic 
sequences which are remarkably similar to the real se- 
quence: similar amplitude of oscillations, similar fre- 
quency, and similar regularity. In the following part, 
we quantify how frequently such sequences can actually 
be observed. 

The distribution function of the frequencies and peak 
heights of the synthetic sequences was constructed 
using the Kernel Density method [Silverman, 1986; 
Beardah, 1995] (Figure 7). The real peak is not far from 
the most probable synthetic peak. From the 
distribution function (Figure 7), we find that the probability 
density function at the most probable synthetic peak is 
proportional to 0.031, while it is proportional to 0.016 
at the values of $\omega$ and $h$ corresponding to the real peak. 
The ratio of these two quantities is close to one half. If we 
look at the separate distributions of frequencies and peak 
heights (Figures 9 and 10), we see that the frequency 
from the real sequence is slightly higher than the most 
probable synthetic frequency, and the peak height of the 
real sequence is almost the most probable synthetic peak 
height. Note that the frequencies from synthetic 
sequences have a rather narrow distribution, as expected 
from [Huang, 1999; Huang et al., 2000]. The regularity 
of oscillations observed in the real sequence (Figure 1, 
quantified by the peak height) is not surprising due to 
the strong smoothing effect of integration [Huang, 1999; 
Huang et al., 2000]. When the same analysis procedure 
is applied to both the real data and the synthetic 
data, this kind of regularity is observed in both. 

Figure 1. The fit of a power law with log-periodic 
oscillations to the normalized cumulative Benioff strain of 
the seismic precursors of the 1989 Loma Prieta 
earthquake. 

Figure 2. The oscillations around the power law of 
the normalized cumulative Benioff strain of the seismic 
precursors of the 1989 Loma Prieta earthquake. 

Figure 3. The Lomb periodogram of the oscillations 
shown in Figure 2. 

Figure 4. Plot of a quantity similar to that in Figure 
1 from a synthetic sequence. 

Figure 5. Plot of a quantity similar to that in Figure 
2 from a synthetic sequence. 

Figure 6. Plot of a quantity similar to that in Figure 
3 from a synthetic sequence. 

Figure 7. The distribution function of frequencies and 
peak heights from the synthetic sequences. The position 
of the real peak is marked by the vertical line. 

4. Synthetic Test II 

4.1. Generation of Synthetic Data 

For synthetic tests to be effective, the synthetic 
data should differ from the real ones by only one 
characteristic, namely the one to be tested. Since in 
our problem we want to test log-periodicity in the 
oscillations around a power law, the synthetic data should 
be the same power law but with known noise (which could 
be additional data errors or random fluctuations). Since 
the power law of the cumulative Benioff strain is in fact 
implied by the magnitude distribution and the 
temporal spacing between events and their correlations, we 
have to generate synthetic magnitudes $m^s$ and times $t^s$ 
to get the same power law. $t^s$ and $m^s$ should be random 
numbers having the same probability distribution as $t$ 
and $m$, because in our observations we had control over 
neither $t$ nor $m$. In this light, the analysis in Section 3 
is thus a bit oversimplified; we now refine it. 

Figure 8. 2D map view of Figure 7. Each circle 
represents one synthetic peak. The frequency of the real 
peak is marked by the vertical line, peak height by the 
horizontal line. 

Figure 9. The distribution of the synthetic 
frequencies. The vertical line marks the position of the real 
frequency. The two diamonds mark the FWHM 
(full-width-half-maximum) of the distribution (the same for 
all subsequent similar plots). 

Figure 10. The distribution of the synthetic peak 
heights. The vertical line marks the position of the real 
peak height. 

One difficulty in generating synthetic t s and m s is 
that the theoretical probability density function (pdf) 
of magnitudes and times for our real data is unknown. 
This problem can be solved by using the empirical pdf 
constructed from the real data. However, we should 
not use the empirical pdf directly, otherwise all features 
of the real data would be reproduced in the synthetic 
data. For example, the empirical distribution of the 
real time sequence (Figure 11) shows some regular os- 
cillations (that could well be genuine physically-based 
log-periodic oscillations) around its general trend. 

Figure 11. The normalized cumulative number of 
events up to time t of the time sequence of the seismic 
precursors of the 1989 Loma Prieta earthquake (solid 
line connecting circles). The other solid line is the 
empirical line repeatedly smoothed (50 times) by 3-point 
moving average. 

If we used exactly this empirical distribution to 
generate the synthetic time sequences, all synthetic 
sequences would show similar regular oscillations: this 
is of course not appropriate, since what we want to test 
is precisely whether these oscillations are the result of 
noise. Therefore, we decided to use only the general 
trend of the experimental data to generate our synthetic 
samples. This general trend is obtained by smoothing 
the empirical distribution. A three-point moving average 
was applied 50 times (10 times for smoothing the 
cumulative distribution function of magnitudes). The 
number of times is not crucial as long as the 
oscillations are wiped out. The criteria for our choice are 
closeness to the empirical curve and lack of oscillations. 
Similar considerations were applied to the generation 
of the synthetic magnitude sequence. The assumption for 
the general trend was not crucial. A reasonable one, 
close to the empirical cumulative distribution curve but 
without the fluctuations, would suffice. 
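
The repeated smoothing can be sketched as below. How the two end points of the curve are treated is not stated in the text; holding them fixed, so that the smoothed cumulative distribution still runs between its original end values, is an assumption of this example, as is the function name.

```python
def smooth_three_point(values, passes):
    """Apply a 3-point moving average `passes` times, end points held fixed."""
    out = list(values)
    if len(out) < 3:
        return out
    for _ in range(passes):
        prev = out
        out = [prev[0]]
        # Interior points are replaced by the mean of themselves and
        # their two neighbors from the previous pass.
        out.extend((prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
                   for i in range(1, len(prev) - 1))
        out.append(prev[-1])
    return out
```

Calling `smooth_three_point(cdf_values, 50)` on the empirical cumulative distribution of times (or with 10 passes for the magnitudes) would give the general trend used to draw synthetic samples. A straight line is left unchanged, so only the fluctuations around the trend are removed.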

Figure 12. Synthetic magnitude sequence (+). The 
real sequence is plotted with o. 

Figure 13. Synthetic time sequence (+). The real 
sequence is plotted with o. 

The next difficulty is that, for a sequence of times 
and magnitudes generated using this method, there is 
no guarantee that the cumulative Benioff strain will 
follow a power law. The time sequence more or less follows 
a power law (we have verified that the cumulative 
number of events of the seismic precursors of the 1989 Loma 
Prieta earthquake is similar to the cumulative Benioff 
strain in power law and log-periodic oscillations), but, 
when combined with the magnitude sequence to 
construct the whole Benioff strain curve, there is no 
obvious reason why we should always get a power law. 
It is natural to expect that the power law of the real 
sequence comes mainly from the fact that events 
occur more frequently and with increasing magnitude (trend 
only) when closer to the main shock. To preserve this 
feature in the synthetic sequences, we decided to 
reorder the events in the synthetic sequence such 
that the event of the k-th magnitude would 
occur at the same position in both the real and 
the synthetic cases (for example, both the real 
sequence and the synthetic sequences have the 
event of the second biggest magnitude being the 
k-th one in the sequence). This reordering scheme 
is applied only to the magnitude sequence (the time 
sequence is an ordered sequence by definition). 
One example is shown in Figures 12 and 13. 
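
The reordering scheme amounts to a rank-matching permutation, which can be sketched as follows. The function name is hypothetical, and the treatment of tied magnitudes (kept in order of occurrence) is an assumption of this example.

```python
def reorder_like(real_mags, synthetic_mags):
    """Permute synthetic_mags so that its rank pattern matches real_mags."""
    # Positions of the real events, from smallest to largest magnitude.
    order = sorted(range(len(real_mags)), key=lambda i: real_mags[i])
    ranked = sorted(synthetic_mags)
    out = [None] * len(real_mags)
    for rank, position in enumerate(order):
        # The synthetic event of a given rank goes where the real event
        # of the same rank sits.
        out[position] = ranked[rank]
    return out

# Small illustration with made-up magnitudes.
example = reorder_like([5.0, 6.2, 5.5, 6.9], [6.0, 5.1, 7.0, 5.3])
# example == [5.1, 6.0, 5.3, 7.0]
```

In the illustration, the largest synthetic magnitude ends up in the last position because the largest real magnitude is last. The set of synthetic values is unchanged, so their one-point distribution is preserved.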

We performed synthetic tests both with and 
without the reordering scheme. In fact, the 
results turned out to be almost identical. 

4.2. Extraction of Oscillations 

The method of extracting oscillations from 
the cumulative Benioff strain is slightly different 
from that of Section 3.2; however, the difference 
turns out to be insignificant. 

Two ways are possible to obtain the 
oscillations. The first involves extracting the best-fit 
power law from the real data. The drawback 
of this approach is that power law fits are often 
not as stable as fits including the log-periodic 
corrections. Sometimes (around 8% of all cases) 
the fit even converges to a $t_f$ smaller than the 
time of the last data point, so that $t_f - t$ is 
negative and $(t_f - t)^{(m-1)}$ is complex since $m < 1$, 
which is of course unphysical. The advantage of 
this approach is that log-periodicity is not 
assumed in the first place. The second approach 
involves extracting the power law obtained from 
a best-fit power law with log-periodic 
oscillations [Johansen and Sornette, 1999a, b; Johansen 
et al., 1999]. The advantage here is that the fit 
always converges well, but now log-periodicity 
is somewhat assumed from the very beginning. 
The more positive viewpoint advocated in 
[Johansen and Sornette, 1999a, b; Johansen et al., 1999] 
to justify this procedure is that fitting with 
log-periodicity allows one to take the most probable 
noise into account [Huang et al., 2000] and thus to 
obtain a good pure power law representation by 
setting the coefficient $C = 0$. In practice, the 
results of the two approaches were very similar: in 
the following, we report only the results using 
the second one. 

It is of course crucial to use exactly the same analysis
procedure for both the real data and the synthetic data;
otherwise, features generated by the analysis procedure
for the real data may not be detected by the synthetic
tests.

The cumulative Benioff strain was first constructed from
the magnitude sequence,

$$\epsilon(t) = \sum_{i=1}^{N(t)} 10^{0.75\, m_i}, \tag{5}$$

where $m_i$ is the magnitude of the ith event and $N(t)$ is
the number of events up to time t, and then normalized such
that $\epsilon_{\max} = 1$ (the unit was changed without
influencing the conclusion). As in [Sornette and Sammis,
1995], the cumulative Benioff strain was then fitted to a
power law with log-periodic oscillations.

Figure 14. The fitting of the normalized cumulative Benioff
strain of the 30 seismic precursors of the 1989 Loma Prieta
earthquake to (1).
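As an illustration of this construction, the following Python sketch
builds a normalized cumulative Benioff strain from a list of
magnitudes, with each event contributing 10^(0.75 m). The function
name `cumulative_benioff` and the magnitudes are hypothetical; the
sketch is not the code used for the study.

```python
from itertools import accumulate


def cumulative_benioff(mags):
    """Normalized cumulative Benioff strain after each event.

    Each event contributes 10**(0.75 * m); the running sum is divided
    by its final value so that the maximum equals 1.
    """
    if not mags:
        return []
    increments = [10 ** (0.75 * m) for m in mags]
    running = list(accumulate(increments))
    return [x / running[-1] for x in running]


mags = [5.0, 5.2, 5.1, 6.0]   # illustrative magnitudes, not catalogue data
strain = cumulative_benioff(mags)
```

Because the contribution grows exponentially with magnitude, the
single magnitude 6.0 event in this example accounts for well over half
of the final strain.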

The de-trended data were obtained by [Johansen and Sornette,
1999a, b; Johansen et al., 1999]

$$\mathrm{detrn} = \frac{\epsilon(t) - A - B\,(t_f - t)^{m}}{B\,(t_f - t)^{m}}, \tag{6}$$

which should be either noise or a pure log-periodic cosine
according to (1), and were then analyzed by the procedure
explained in Section 2.
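A de-trending step of this kind can be sketched in Python as follows.
The sketch assumes that the residual is the data minus the fitted
power law A + B (tf - t)^m, divided by B (tf - t)^m; the form of the
numerator is an assumption made for illustration, and the function
name `detrend` and its argument names are hypothetical.

```python
def detrend(times, strain, A, B, tf, m):
    """Relative residual of the data about the power law A + B*(tf - t)**m.

    Assumed form: (strain - A - B*(tf - t)**m) / (B*(tf - t)**m).
    """
    out = []
    for t, e in zip(times, strain):
        if t >= tf:
            raise ValueError("every time must be earlier than tf")
        power = B * (tf - t) ** m
        out.append((e - A - power) / power)
    return out
```

With this normalization, a series that follows the power law exactly
gives zero everywhere, and any multiplicative correction to the power
law term is returned unchanged.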

4.3. Results 

4.3.1. The real sequence. The fit to the real data agreed
well with the theoretical curve (Figure 14).

There was a peak at ω = 6.1 in the spectrum (Figure 16) of
the de-trended data (Figure 15), close to the best-fit
parameter ω = 5.7.

4.3.2. Synthetic sequences. 1000 synthetic se- 
quences (one example in Figures 12 and 13) were 
analyzed using the same procedure as that for 
the real sequence. The cumulative Benioff strain 
of the synthetic sequence showed obvious simi- 
larity to the real data (Figure 17). 

We first checked whether the synthetic data gave rise to
power laws. To do this, we fitted these data to a power law
and measured the sum of squared errors (SSE) of the fit: in
about 50% of the cases, the synthetic sequences had an SSE
smaller than that of the real sequence, indicating that
they were roughly as well described by power laws as the
real data.

Figure 15. The de-trended data of the normalized Benioff
strain of the 30 seismic precursors of the 1989 Loma Prieta
earthquake.

Figure 16. Spectrum of the de-trended data in Figure 15.

Figure 17. The fitting of the normalized cumulative Benioff
strain of one synthetic sequence to (1).

We note here that the ratio of the probabil- 
ity of observing a synthetic sequence with SSE 
similar to that from the real sequence divided 
by the probability of observing a synthetic se- 
quence with the most probable SSE is around 
79%. If we used SSE to measure the goodness of 
the power law model for our synthetic data, the 
real sequence would be very close to the most 
probable synthetic sequence. 
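The power law check described above can be illustrated with a short
Python sketch. It assumes that the pure power law has the form
A + B (tf - t)^m, and it swaps in a plain grid search over tf and m
(with A and B obtained by linear least squares at each grid point) for
whatever optimizer was actually used. The function name
`fit_power_law`, the grids, and the noise-free test series are all
hypothetical.

```python
def fit_power_law(times, strain, tf_grid, m_grid):
    """Grid search over (tf, m); A and B follow from linear least squares.

    Returns (sse, A, B, tf, m) for the best grid point, or None if no
    grid point is admissible.
    """
    n = len(times)
    last = max(times)
    mean_y = sum(strain) / n
    best = None
    for tf in tf_grid:
        if tf <= last:
            continue  # tf must lie after the last data point
        for m in m_grid:
            x = [(tf - t) ** m for t in times]
            mean_x = sum(x) / n
            sxx = sum((xi - mean_x) ** 2 for xi in x)
            if sxx == 0.0:
                continue  # degenerate regressor, e.g. m = 0
            sxy = sum((xi - mean_x) * (yi - mean_y)
                      for xi, yi in zip(x, strain))
            B = sxy / sxx
            A = mean_y - B * mean_x
            sse = sum((A + B * xi - yi) ** 2 for xi, yi in zip(x, strain))
            if best is None or sse < best[0]:
                best = (sse, A, B, tf, m)
    return best


# Noise-free illustrative series with tf = 1990 and m = 0.5.
times = [1940 + 2 * i for i in range(25)]
strain = [1.2 - 0.15 * (1990.0 - t) ** 0.5 for t in times]
tf_grid = [1989.0 + 0.5 * k for k in range(13)]
m_grid = [k / 10 for k in range(1, 10)]
best = fit_power_law(times, strain, tf_grid, m_grid)
```

Applying the same function to the real sequence and to each synthetic
sequence gives SSE values that can be compared directly, which is the
comparison made in the text.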

We then compared the fit of a power law with 
log-periodic oscillations to the real and the syn- 
thetic sequences. 

The SSEs of the synthetic sequences lie in the range
[0.005, 0.05], centered around the SSE of the real sequence
(~0.013). About one quarter of the synthetic sequences have
an SSE smaller than that of the real sequence. The ratio of
the probability of observing a synthetic sequence with SSE
similar to that of the real sequence divided by the
probability of observing a synthetic sequence with the most
probable SSE is around 96%. If we use SSE to measure the
regularity of log-periodic oscillations, the real sequence
is thus very close to the most probable synthetic sequence.

The tf from the synthetic sequences is distributed in
[1988, 1995] (Figure 18). The ratio of the probability of
observing a synthetic sequence with tf similar to that from
the real sequence divided by the probability of observing a
synthetic sequence with the most probable tf is around 95%.
Thus the apparently accurate prediction of the actual main
shock time for the 1989 Loma Prieta earthquake [Sornette
and Sammis, 1995] might be due to chance. The width of the
distribution (FWHM, full width at half maximum) is 6.8
years, narrower than that from the fit of a pure power law
(7.8 years), suggesting that log-periodicity may improve
power law fits by accounting for the most probable noise
[Huang et al., 2000].

Figure 18. The distribution of main shock times (tf in (1))
from the synthetic sequences. The vertical line marks the
value from the real sequence.
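For reference, a full width at half maximum can be read off a
histogram as in the Python sketch below. How the widths quoted in the
text were actually measured is not stated; the linear interpolation
between neighbouring bins, the function name `fwhm`, and the sample
counts are assumptions made for illustration.

```python
def fwhm(centers, counts):
    """Full width at half maximum of a single-peaked histogram.

    Returns (left, right, width); the half-maximum crossings are
    located by linear interpolation between neighbouring bins.
    """
    peak = max(range(len(counts)), key=lambda i: counts[i])
    half = counts[peak] / 2.0

    def crossing(step):
        i = peak
        # Walk away from the peak while the next bin is still at or
        # above half maximum.
        while 0 <= i + step < len(counts) and counts[i + step] >= half:
            i += step
        j = i + step
        if j < 0 or j >= len(counts):
            return centers[i]  # the peak runs off the edge of the histogram
        frac = (counts[i] - half) / (counts[i] - counts[j])
        return centers[i] + frac * (centers[j] - centers[i])

    left, right = crossing(-1), crossing(+1)
    return left, right, right - left


# Illustrative histogram, not data from the study.
left, right, width = fwhm([0.0, 1.0, 2.0, 3.0, 4.0], [1, 3, 8, 3, 1])
```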

There is a well-defined peak in the distribu- 
tion of the frequencies of the log-periodic os- 
cillations from the fitting of the synthetic se- 
quences (Figure 19). The ratio of the proba- 
bility of observing a synthetic sequence with fre- 
quency similar to that from the real sequence di- 
vided by the probability of observing a synthetic 
sequence with the most probable frequency is 
around 81%. 

We summarize the above results in Table 1. 

We now turn to the statistics from the characterization of
log-periodicity by spectral analysis of the de-trended
data. If we only look at the distribution of peak heights,
the ratio of the probability of observing a synthetic peak
similar to the real peak divided by the probability of
observing the most probable synthetic peak is around 99.9%
(Figure 20).

If we only look at the frequencies, the ratio of the
probability of observing a synthetic frequency similar to
that of the real peak divided by the probability of
observing the most probable synthetic frequency is around
56% (Figure 21).

Figure 19. The distribution of frequencies of log-periodic
oscillations from the fitting of the synthetic sequences.
The vertical line marks the value from the real sequence.

Figure 20. The distribution of peak heights from the
de-trended data. The vertical line marks the value from
the real sequence.

Table 1. Parameters From the Fit of the Synthetic Data With PW (Pure Power Law) and PWLG (Power Law
With Log-Periodic Oscillations)

Fit    Parameter  pr(a)   pmp(b)  ratio  mp(c)    real(d)  left(e)  right(f)  FWHM(g)
PW     m          0.72    1.99    0.36   0.34     0.69     0.17     0.57      0.40
PW     tc         0.073   0.13    0.56   1994.3   1988.7   1988.4   1996.2    7.8
PW     SSE        20.0    24.9    0.79   0.028    0.037    0.016    0.052     0.036
PWLG   m          1.94    2.23    0.87   0.45     0.52     0.25     0.63      0.38
PWLG   tc         0.13    0.14    0.95   1990.5   1989.8   1988.1   1994.8    6.8
PWLG   SSE        50.0    51.8    0.96   0.015    0.017    0.0092   0.026     0.014
PWLG   ω          0.21    0.26    0.81   4.28     5.67     3.16     6.99      3.83
PWLG   ?          2.84    6.71    0.42   -0.039   -0.083   -0.078   0.075     0.15

(a) Value of the probability density function at a synthetic peak similar to the real peak.
(b) Value of the probability density function at the most probable synthetic peak.
(c) The most probable value from synthetic data.
(d) The value from real data.
(e) Value at the left point of the FWHM of the distribution of synthetic values.
(f) Value at the right point of the FWHM of the distribution of synthetic values.
(g) Full width at half maximum of a peak.

Figure 21. The distribution of frequencies from the
de-trended data. The vertical line marks the value from
the real sequence.

When we look at the joint distribution of peak heights and
frequencies, the ratio of the probability of observing a
peak similar to the real peak divided by the probability of
observing the most probable synthetic peak is about 56%
(Figures 22 and 23).




Figure 22. The distribution of peak heights and fre- 
quencies from the de-trended data. 

Not using the reordering scheme did not significantly
change the above results, except that, for some synthetic
sequences, the power law fit was not good.

Figure 23. The 2D map view of Figure 22. The vertical and
horizontal lines mark the values from the real sequence.

Also, recall that the foregoing results involved fitting
the data to a power law with log-periodic oscillations and
then de-trending. Fitting the data to a pure power law
instead produced very similar results.

We summarize all the results of the previous sections in
Table 2.

Table 2. Synthetic Tests of the Log-Periodicity Observed in
the Benioff Strain of the Seismic Precursors of the 1989
Loma Prieta Earthquake

                 EOPW(a)                  De-trended data
Quantity      pr(b)   pmp(c)  ratio    pr      pmp     ratio
ω             0.11    0.11    0.99     0.12    0.22    0.56
h             0.19    0.20    0.93     0.12    0.12    1.00
(ω, h)        0.027   0.031   0.89     0.016   0.029   0.56
(ω, h)(d)     0.020   0.025   0.81     0.16    0.23    0.73

(a) Extracted oscillations using the best-fit pure power law.
(b) Value of the probability density function at a synthetic
peak similar to the real peak.
(c) Value of the probability density function at the most
probable synthetic peak.
(d) No reordering.

5. Discussion

These synthetic tests were performed to determine whether
the reported log-periodicity [Sornette and Sammis, 1995]
can be observed from integrated noise in power laws and, if
so, with what probability. We found that, if we use the
highest peak of the spectrum of the oscillations around the
power law of the cumulative Benioff strain to quantify the
log-periodicity in the oscillations, peaks similar to the
peak observed from the real sequence (the real peak) were
indeed frequently observed from the synthetic sequences.
The probability of observing a synthetic peak similar to
the real peak is more than 50% of the probability of
observing the most probable synthetic peak.

It is reasonable to use the highest peak in the spectrum of
a signal to quantify the most significant frequency
component in the signal. The position of the peak is the
frequency (ω), and the peak height (h) quantifies the
regularity of that frequency component. If two signals have
similar peaks in their spectra, they must have oscillations
of similar frequency and regularity.

Peaks similar to the real peak were observed frequently
from the synthetic sequences. To quantify this frequency of
observation, we constructed the probability density
function p(ω, h) of the synthetic peaks in the space of
(ω, h). From p(ω, h), we were able to obtain the
probability of observing a synthetic peak similar to the
real peak (ω_r, h_r), that is, p(ω_r, h_r) dω dh. This
probability might be a small number, which is meaningful
only when compared with the probability of observing the
most probable synthetic peak, which is p(ω_mp, h_mp) dω dh.
The ratio of the two probabilities quantifies well how
frequently we observe the feature from synthetic data.
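The ratio of the two probabilities can be illustrated with a
histogram estimate of the density. The Python sketch below is one
possible implementation under stated assumptions, not the estimator
used for the study: it bins the synthetic peaks on a regular grid (so
that every cell has the same area dω dh and the ratio of counts equals
the ratio of probabilities), and the function name `density_ratio`,
the grid, and the sample peaks are hypothetical.

```python
def density_ratio(samples, real, lo, hi, nbins):
    """Histogram density at `real` divided by the density in the fullest cell.

    samples: (omega, h) pairs from the synthetic sequences
    real:    the (omega, h) pair from the real sequence
    lo, hi:  (omega, h) lower and upper edges of the histogram
    nbins:   number of bins along each axis, (n_omega, n_h)
    """
    def cell(point):
        idx = []
        for x, a, b, n in zip(point, lo, hi, nbins):
            if not a <= x <= b:
                return None  # outside the histogram range
            idx.append(min(int((x - a) / (b - a) * n), n - 1))
        return tuple(idx)

    counts = {}
    for s in samples:
        c = cell(s)
        if c is not None:
            counts[c] = counts.get(c, 0) + 1
    if not counts:
        raise ValueError("no samples fall inside the histogram range")
    return counts.get(cell(real), 0) / max(counts.values())


# Illustrative synthetic peaks, not results from the study.
samples = [(4.2, 1.3)] * 4 + [(5.6, 1.4)] * 2 + [(6.5, 3.2)]
ratio = density_ratio(samples, (5.7, 1.5),
                      lo=(3.0, 0.0), hi=(7.0, 4.0), nbins=(4, 4))
```

A ratio close to 1 means that the real peak falls where synthetic
peaks are most common; a ratio close to 0 means that it falls in a
region the synthetic peaks rarely reach.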

The mechanism at the origin of log-periodicity in the
synthetic data sets has been discussed in [Huang, 1999;
Huang et al., 2000]. Briefly, log-periodicity results from
the fact that taking the cumulative of a power law involves
a low-pass filtering step (reddening of the noise) which,
in a finite sample, creates a maximum in the spectrum
leading to a most probable log-frequency corresponding
approximately to 1.5 cycles over the full sampled interval.
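The low-pass filtering step alone can be illustrated numerically. The
Python sketch below compares the share of spectral power carried by
the lowest few frequencies in white noise and in its cumulative sum.
It illustrates only the reddening of the noise, not the full mechanism
or the value of 1.5 cycles; the function name
`low_frequency_fraction`, the sample length, and the seed are
arbitrary choices made for illustration.

```python
import cmath
import random


def low_frequency_fraction(signal, n_low):
    """Share of spectral power (zero frequency excluded) in the n_low lowest frequencies."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    power = []
    for k in range(1, n // 2 + 1):
        # Direct discrete Fourier transform at frequency index k.
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j, c in enumerate(centered))
        power.append(abs(coeff) ** 2)
    return sum(power[:n_low]) / sum(power)


random.seed(0)  # fixed seed, for repeatability only
white = [random.gauss(0.0, 1.0) for _ in range(256)]

cumulative = []
total = 0.0
for w in white:
    total += w
    cumulative.append(total)

frac_white = low_frequency_fraction(white, 4)
frac_cumulative = low_frequency_fraction(cumulative, 4)
```

For white noise the four lowest frequencies are expected to carry only
a few percent of the power, whereas for the cumulative sum they
typically carry most of it.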

We looked into two quantities for log-periodicity 
in the oscillations around the power law of the 
cumulative Benioff strain. The extracted oscilla- 
tions are the difference between the data and the 
best-fit power law. The de-trended data were 
obtained using (6). For both of them, the ra- 
tio of the two probabilities is bigger than 50%, 
which means that it is not only possible to ob- 
serve that kind of log-periodicity in synthetic 
data, but also highly probable. 

Our synthetic events and the real events have 
the same distribution in time and magnitude, 
and they were analyzed in exactly the same way. 
Since discrete scale invariance is not present in 
the synthetic data, the log-periodicity observed 
in the real sequence cannot be used as evidence 
for discrete scale invariance. 

In fact, even the power law used as evidence of ordinary
scale invariance could be explained by other mechanisms.
Indeed, the cumulative Benioff strains of the synthetic
sequences in Section 4 do follow power laws similar to that
of the real sequence. For the real sequence, the cumulative
distribution function of event times is not significantly
different from a straight line. The cumulative distribution
of moments is not a power law either (it has an S shape
instead of the usual power law shape). This last statement
is ad hoc: for this sequence of magnitudes, the number of
data points is small (only 30) and the magnitude cut-off is
very high (5.0), so even if the moment distribution is in
general a power law, large statistical fluctuations may
make the moment distribution of this particular sequence
deviate from one. We use an empirical distribution function
instead of the usual power law assumption to avoid
dependence on that assumption; in fact, our method remains
valid whatever the underlying distribution is. For the real
sequence, however, magnitudes tend to be larger closer to
the main shock, especially for the last several events
[Jones, 1994]. Since a small difference in magnitude
translates into a large difference in the cumulative
Benioff strain, the last several events bend the would-be
linear trend upward, which happens to be well described by
a power law with a small exponent. Since this tendency of
the magnitudes to increase was preserved in the generation
of synthetic magnitudes, power laws also describe the
synthetic data well. In fact, even without the reordering
scheme that preserves this feature of the magnitudes, we
were still able to obtain similar results. As long as the
magnitudes are not exactly uniform, the largest magnitude
will bend the would-be linear trend somewhere, and a power
law with a small exponent can still describe the data well.

The improved accuracy of the prediction of the main shock
time of the 1989 Loma Prieta earthquake obtained by
including log-periodic oscillations was regarded as
evidence supporting the hypothesis in [Sornette and Sammis,
1995]. However, the study presented here shows that it is
not rare to obtain a tf near that value from our synthetic
data, which contain events in the range [1940, 1988]. If we
use a power law to describe the data, by definition tf
should be slightly later than the time of the last data
point. Indeed, in our simulations, we found that tf is
distributed in [1988, 1995], and the probability of
observing a synthetic tf similar to the real tf is around
95% of the probability of observing the most probable
synthetic tf. The point is that, given a sequence of events
in that time range and a power law assumption, that kind of
tf is not hard to find.

It is important to emphasize that the present study does
not alter the usefulness of studying seismic precursors.
However, the physical interpretation associated with the
observations in [Sornette and Sammis, 1995] does not seem
to be warranted, at least on the basis of the Loma Prieta
case alone. Nevertheless, as pointed out in [Huang et al.,
2000] and also found in this paper, log-periodic
oscillations are robust features of power laws. The present
analysis, as well as those given in [Huang et al., 2000],
suggests that, whatever their origin (noise or physical),
they might still be used to improve the prediction of the
main event. This is clearly what we observe in our
synthetic tests performed on pure power laws without
log-periodicity: a power law fit with log-periodicity gives
a better estimate of tf than a pure power law fit without
the log-periodic oscillations. The reason may be that, by
fitting the most probable form of noise, the fit is more
stable. It seems worthwhile to investigate this possibility
further in future studies.